Arguing against the Proposition is Geoffrey Ibbott, Ph.D. Dr. Ibbott is Professor and Chief of the Section of Outreach Physics at the UT M.D. Anderson Cancer Center in Houston. The section includes several programs known to many medical physicists, including the Accredited Dosimetry Calibration Laboratory, Radiation Dosimetry Services, and the Radiological Physics Center (RPC). As Director of the RPC, Dr. Ibbott has a particular interest in the quality assurance of cooperative group clinical trials. When not busy with his professional activities, Dr. Ibbott can be found ballroom dancing with his wife, Diane, or sailing in Galveston Bay.

The accuracy and reproducibility of radiation therapy can be improved by standardizing and calibrating image segmentation methods. Segmentation is a commonly performed procedure that affects critical treatment planning and delivery decisions. Image-guided 3D and emerging four-dimensional (4D) planning and delivery methods require one or more user-created models of the patient. These models are used to:

- localize and display objects of interest;
- position beams and shape beam apertures;
- compute dose-volume histograms (DVHs) and volume-weighted metrics;
- characterize temporal changes in patient anatomy;
- transfer information from one or more reference images to inter- and intra-treatment images for accurate and reproducible targeting.

The structures that make up the patient model are defined by segmenting volume images. Because so many departments practice image-guided planning and delivery, segmentation is likely performed more often as a clinical procedure in radiation oncology than in all other medical specialties combined. Calibrating dose at a point in water served the two-dimensional (2D) and nonconformal 3D eras well. It is insufficient for the modern era, however. That era is distinguished by tight margins and steep dose gradients intended to shrink-wrap the high-dose region around the target while conformally avoiding nearby normal tissues.
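Computing DVHs is one of the tasks above that depends directly on the segmented structures. The following is a minimal sketch, not a clinical implementation, and it rests on three assumptions:

- The dose grid and a binary structure mask are flattened into parallel lists of equal-volume voxels.
- Partial voxels and interpolation, which real planning systems handle, are ignored.
- The one-dimensional dose profile is hypothetical.

It shows how shifting a contour by a single voxel across a steep dose gradient changes a DVH point.

```python
def cumulative_dvh(dose, mask, thresholds):
    """Fraction of a structure's voxels receiving at least each threshold dose.

    dose and mask are parallel, flattened voxel lists (equal voxel volumes
    assumed); a truthy mask entry marks a voxel inside the segmented structure.
    """
    inside = [d for d, m in zip(dose, mask) if m]
    if not inside:
        raise ValueError("structure mask selects no voxels")
    # Cumulative DVH: for each threshold t, the fraction of voxels with dose >= t.
    return [sum(d >= t for d in inside) / len(inside) for t in thresholds]


# Hypothetical 1D dose profile (Gy): a 60 Gy plateau with a steep fall-off.
dose = [60.0, 60.0, 60.0, 60.0, 30.0, 10.0]
target = [1, 1, 1, 1, 0, 0]    # contour covering the high-dose plateau
shifted = [0, 1, 1, 1, 1, 0]   # same-size contour shifted by one voxel

# V57Gy (95% of the hypothetical 60 Gy prescription) as a volume fraction.
print(cumulative_dvh(dose, target, [57.0]))   # [1.0]
print(cumulative_dvh(dose, shifted, [57.0]))  # [0.75]
```

In this toy case, a one-voxel contour shift moves V57Gy from 100% to 75% of the structure volume.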
The modern approach, particularly in conjunction with inverse planning methods, is exquisitely sensitive to geometric variations in the patient model. The quality of manual segmentation is degraded by user-specific systematic and random intra- and inter-user variabilities, which in turn manifest as suboptimal plans and imprecise targeting. Emerging automatic methods [1–8] promise to significantly reduce random variability. What remains are predominantly systematic errors that in principle can be corrected by training, algorithm “tuning,” or supervised editing. While it would be impractical, if not impossible, to calibrate human segmentation, it is possible to standardize and calibrate automatic methods. Standardization would eliminate nonuniform in-house practices that confound comparison of clinical studies from different sites and impede accurate export and import of protocols. Calibration would assure compliance with an accepted standard.

Current automatic methods produce approximate segmentations intended for supervised editing. As automatic methods improve and gain the confidence of users, and as the number of imaging procedures increases, practice is likely to shift toward minimal supervision and eventually to almost complete reliance on automation. Minimal supervision raises the required standard of performance and adds urgency to finding standardization and calibration solutions. Work on those solutions should begin now. One challenge is characterizing the performance of algorithms in clinically relevant terms. Performance characterization is an active topic in computer vision, where important questions are being addressed [9–11]. Practical issues include agreeing on standard practices and developing calibration methods that can be widely implemented. The AAPM can take a leadership role by organizing focused sessions at its annual meetings and by forming task groups to study the major issues and make recommendations.
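The distinction drawn above between systematic and random variability can be made concrete with a small sketch. It makes three assumptions:

- Each user has delineated the same structure several times.
- Each delineation has been reduced to one signed boundary offset, in mm, from a single chosen reference contour.
- Both that reduction and the choice of reference are hypothetical.

It borrows a convention common in setup-error analysis. The standard deviation of the per-user means is taken as the systematic component. The root-mean-square of the per-user standard deviations is taken as the random component.

```python
import statistics


def decompose_variability(offsets_by_user):
    """Split repeated boundary offsets (mm) into systematic and random parts.

    offsets_by_user: {user_id: [offset, offset, ...]} with at least two users
    and at least two repeats per user, all measured against one chosen
    (hypothetical) reference contour.
    """
    # Each user's mean offset is that user's systematic bias.
    user_means = {u: statistics.fmean(v) for u, v in offsets_by_user.items()}
    user_sds = [statistics.stdev(v) for v in offsets_by_user.values()]
    # Systematic component: spread of the per-user mean offsets across users.
    sigma_sys = statistics.stdev(list(user_means.values()))
    # Random component: root-mean-square of each user's repeat-to-repeat SD.
    sigma_rand = statistics.fmean(sd * sd for sd in user_sds) ** 0.5
    return user_means, sigma_sys, sigma_rand


# Hypothetical repeated delineations by two users.
means, sys_mm, rand_mm = decompose_variability(
    {"user_a": [1.0, 2.0, 3.0], "user_b": [-1.0, 0.0, 1.0]}
)
print(means)
print(round(sys_mm, 3), round(rand_mm, 3))
```

Consider an automatic method that returns the same contour every time it is run on the same image. Its random component is zero by construction, so any offset it shows is a constant bias. That is the kind of predominantly systematic error the passage describes as correctable by training or tuning.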
Inaction guarantees that the promise of standardization and calibration will be lost.

Standardization refers to developing protocols that encourage uniform segmentation practices. A protocol for the rectum might specify, for example, the length of rectum to be segmented and whether the rectal wall and/or the anterior portion near the prostate should be segmented. Calibration refers to quantifying segmentation accuracy and reproducibility.

Dr. Ibbott presents a clear picture of the professional and scientific challenges that must be addressed to find widely acceptable solutions for calibration and standardization of image segmentation for radiation therapy. Standardization of target volumes is particularly problematic. He makes the point well that better understanding and consensus are needed. This argument applies to organs at risk (OAR) as well, although the imaging issues there are less complex. In the scientific arena, further research is also needed to develop, validate, and calibrate segmentation methods. Traveling the road to understanding and consensus is a community venture. It requires compiling and vetting information, education, airing of ideas, and open debate. Professional and scientific organizations such as the AAPM can play an important role in this process.

In nearly every aspect, standardization in radiation oncology is a desirable objective. It is widely agreed, for example, that treatment machine calibration must be traceable to national standards to assure that dose delivery is consistent among departments [12]. Data communication, particularly involving radiological images and radiation therapy treatment plans, must be conducted according to protocols such as DICOM to ensure that the data are received intact.
Even seemingly obvious matters, such as the units and coordinate systems used to describe radiation beam size and orientation, must be standardized so that these parameters are communicated accurately [13]. Several radiation therapy accidents have occurred because of two kinds of confusion [14]:

- misunderstanding of the coordinate system used to describe the placement of the radiation field;
- confusion over the units of a calibration coefficient.

With regard to the definition of target volumes and OAR in radiation therapy, standardization would appear to have many benefits. Defining target volumes according to a standard could streamline operations in a radiation therapy department and might result in time and cost savings. Perhaps even more significantly, standardizing target volume definition could improve the quality of clinical trials by ensuring that patients receive equivalent care at multiple institutions. One day, we will probably reach this point. Today, however, may be too soon to develop standards for defining target volumes and OAR. Numerous studies have shown that physicians rarely agree on the shape and size of a target volume [15]. Physicians involved in clinical trials suggest that inconsistent identification of target volumes is probably a greater cause of variation in patient treatment than is the implementation of new treatment technologies [16]. There is no “gold standard” for any target volume or other structure.
In fact, consider a recently closed multi-institutional study of IMRT for treatment of the oropharynx. There, the principal investigator revised the contours drawn by physicians registering patients in the trial, to ensure that dose-volume histograms were calculated according to his criteria [17]. Another RTOG/NSABP trial that opened recently requires participants to demonstrate their willingness and ability to define the target volume according to the principal investigators’ criteria [18]. It is unlikely that these investigators would have accepted a standards agency's definition of the targets and other structures. The growing use of imaging technologies such as MR and PET in radiation therapy planning is changing the way target volumes are defined. On the horizon are molecular imaging techniques that promise to allow routine identification of tumor-bearing tissue in organs such as the prostate, further improving the radiation oncologist's ability to define the target. In addition, the current interest in 4D imaging and treatment to accommodate respiratory motion raises new uncertainties in the identification of target volumes. Standardizing now could limit the development of these techniques and discourage new research. Until the identification of tumor volumes is better understood and consensus is achieved in the radiation therapy community, the standardization of target volume segmentation should remain a research project.

In his opening statement, my opponent claims that standardization and calibration of segmentation methods can improve the accuracy and reproducibility of radiation therapy. It is hard for me to disagree with this claim. My career has been closely associated with the development of standards [13] and the supervision of calibration laboratories [12], and I support both activities. But I question whether automatic segmentation can improve the quality of radiation therapy.
Is it true, as it is with dosimetry, that complying with a consistent standard matters more than the correctness of the standard? With regard to defining target volumes, being consistently in error is not acceptable. Removing random variability could be detrimental to patient care if the wrong standard is chosen. Dr. Chaney says that calibration of segmentation techniques would assure compliance with an accepted standard. If only there were such a standard!

The literature is filled with examples of the disagreements encountered when several physicians are asked to define target volumes [15, 19, 20]. There are also examples of disagreements that arise when different imaging modalities, such as CT and MRI, are used for the segmentation procedure [21]. The introduction of PET has improved the ability to identify the location and extent of lung tumors, leading radiologists to change the volumes delineated on CT images [22]. My opponent points to a recent publication discussing the development and design of a segmentation algorithm [3]. In a related paper, the algorithm is used to segment kidneys on CT images [23]. The performance of the algorithm is impressive. Even so, the authors note that significant disagreements occurred between two experienced physicians who contoured the same kidney. The disagreement between the automatic segmentations and those drawn by humans was of similar magnitude. It is not clear how well this algorithm would perform on structures that are less clearly defined than the kidney, as many tumors are.

Dr. Chaney concludes that the challenges facing the implementation of segmentation algorithms include:

- finding ways to characterize their performance;
- developing calibration methods;
- agreeing on standard practices.

He adds that the AAPM could support development of the consensus needed to address these issues. I wholeheartedly agree and look forward to progress in this area.
In the meantime, as I stated earlier, I believe automatic segmentation should remain in the research arena.
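Calibration, in the sense of quantifying segmentation accuracy and reproducibility, needs an agreement metric. The kidney comparison discussed above is framed as automatic-versus-human agreement relative to physician-versus-physician agreement. The sketch below uses the Dice similarity coefficient, 2|A∩B|/(|A|+|B|), as an assumed example metric; the debate does not name one. The voxel sets are hypothetical single-slice rectangles, not data from the cited study.

```python
def dice(a, b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty segmentations agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


def box(r0, r1, c0, c1):
    """Hypothetical rectangular 'contour' as a set of (row, col) voxels."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Hypothetical segmentations of the same structure on one slice.
phys_a = box(2, 7, 2, 7)   # physician A
phys_b = box(3, 8, 2, 7)   # physician B, offset by one row
auto = box(2, 7, 3, 8)     # automatic result, offset by one column from A

inter_observer = dice(phys_a, phys_b)
auto_vs_human = [dice(auto, phys_a), dice(auto, phys_b)]
print(f"physician vs physician: {inter_observer:.2f}")
print("automatic vs physicians:", [round(x, 2) for x in auto_vs_human])
```

A Dice value measures only how much two segmentations overlap. Without an accepted reference, it cannot say which one is correct, which is the gap Dr. Ibbott emphasizes.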